NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Operating the 200 Gbps IRIS-HEP Demonstrator for ATLAS

https://doi.org/10.1051/epjconf/202533701061

Gardner_Jr, Robert W; Benjamin, Douglas; Bryant, Lincoln; Feickert, Matthew; Golnaraghi, Farnaz; Held, Alexander; Hu, Fengping; Jordan, David; Stephen, Judith; Vukotic, Ilija; et al (October 2025, EPJ Web of Conferences)
Szumlak, T; Rachwał, B; Dziurda, A; Schulz, M; vom_Bruch, D; Ellis, K; Hageboeck, S (Ed.)
The ATLAS experiment is currently developing columnar analysis frameworks which leverage the Python data science ecosystem. We describe the construction and operation of the infrastructure necessary to support demonstrations of these frameworks, with a focus on those from IRIS-HEP. One such demonstrator aims to process the compact ATLAS data format PHYSLITE at rates exceeding 200 Gbps. Various access configurations and setups on different sites are explored, including direct access to a dCache storage system via Xrootd, the use of ServiceX, and the use of multiple XCache servers equipped with NVMe storage devices. Integral to this study was the analysis of network traffic and bottlenecks, worker node scheduling and disk configurations, and the performance of an S3 object store. The system’s overall performance was measured as the number of processing cores scaled to over 2,000 and the volume of data accessed in an interactive session approached 200 TB. The presentation will delve into the operational details and findings related to the physical infrastructure that underpins these demonstrators.
more » « less
Full Text Available
Can You Mimic Me? Exploring the Use of Android Record & Replay Tools in Debugging

https://doi.org/10.1109/MOBILESoft66462.2025.00011

Song, Zihe; Mansur, S_M Hasan; Rathnasuriya, Ravishka; Fatima, Yumna; Yang, Wei; Moran, Kevin; Lam, Wing (April 2025, IEEE)

Full Text Available
ABCG2: A Milestone Charge Model for Accurate Solvation Free Energy Calculation

https://doi.org/10.1021/acs.jctc.5c00038

He, Xibing; Man, Viet H; Yang, Wei; Lee, Tai-Sung; Wang, Junmei (March 2025, Journal of Chemical Theory and Computation)

Full Text Available
“Can You Mimic Me? Exploring the Use of Android Record & Replay Tools in Debugging”

Song, Zihe; Mansur, S_M_Hasan; Rathnasuriya, Ravishka; Fatima, Yumna; Yang, Wei; Moran, Kevin; Lam, Wing (April 2025, Proceedings of 12th International Conference on Mobile Software Engineering and Systems)

Android User Interface (UI) testing has emerged as an important and prevalent research topic due to the ubiquity of apps and the unique challenges faced by developers in this software domain. One popular topic of research that aims to facilitate both manual and automated UI testing and debugging processes is record and replay (R&R) tools. These tools allow for the recording of UI actions to facilitate the execution of test scenarios and the replay of various types of bugs. R&R tools typically support three main settings: (i) UI regression testing via R&R of feature-based execution scenarios, (ii) R&R of non- crashing functional bugs (e.g., in crowdsourced settings), and (iii) R&R of crashing bugs. Despite the progress made in research related to R&R tools, prior work examined only the effectiveness of these tools in disparate or fragmented settings. As such, the research community currently lacks a comprehensive examination of the effectiveness of existing tools across their common use cases and the potential key limitations that emerge. We address this current gap in knowledge by conducting a thorough empirical study on using R&R tools to manually record and replay feature-based user scenarios, non-crashing failures, and crashing bugs. Additionally, we explore the possibility of using R&R tools in conjunction with automated input genera- tion (AIG) tools to automatically record and replay crashing bugs. Our study context includes one industrial and three academic R&R tools, 34 user scenarios from 17 apps, 90 non-crashing failures from 42 Android apps, and 31 crashing bugs from 17 Android apps. Our results illustrate that 17% of user scenarios, 38% of non-crashing failures, and 44% of crashing bugs are not able to be reliably recorded and replayed, with the most prevalent reasons for non-replayability being action interval resolution, incompatibility related to APIs, and limitations in Android tooling. Our findings reveal important research directions for R&R tools to facilitate their practical application and adoption.
more » « less
Full Text Available
Body Mass–Biomass Scaling Modulates Species Keystone‐Ness to Press Perturbations

https://doi.org/10.1111/ele.70086

Li, Xiaoxiao; Yang, Wei; Novak, Mark; Zhao, Lei; de_Ruiter, Peter C; Yang, Zhifeng; Guill, Christian (February 2025, Ecology Letters)

ABSTRACT Identifying species with disproportionate effects on other species under press perturbations is essential, yet how species traits and community context drive their ‘keystone‐ness’ remain unclear. We quantified keystone‐ness as linearly approximated per capita net effect derived from normalised inverse community matrices and as non‐linear per capita community biomass change from simulated perturbations in food webs with varying biomass structure. In bottom‐heavy webs (negative relationship between species' body mass and their biomass within the web), larger species at higher trophic levels tended to be keystone species, whereas in top‐heavy webs (positive body mass to biomass relationship), the opposite was true and the relationships between species' energetic traits and keystone‐ness were weakened or reversed compared to bottom‐heavy webs. Linear approximations aligned well with non‐linear responses in bottom‐heavy webs, but were less consistent in top‐heavy webs. These findings highlight the importance of community context in shaping species' keystone‐ness and informing effective conservation actions.
more » « less
Full Text Available
LLMEffiChecker : Understanding and Testing Efficiency Degradation of Large Language Models

https://doi.org/10.1145/3664812

Feng, Xiaoning; Han, Xiaohong; Chen, Simin; Yang, Wei (September 2024, ACM Transactions on Software Engineering and Methodology)

Large Language Models (LLMs) have received much recent attention due to their human-level accuracy. While existing works mostly focus on either improving accuracy or testing accuracy robustness, the computation efficiency of LLMs, which is of paramount importance due to often vast generation demands and real-time requirements, has surprisingly received little attention. In this article, we make the first attempt to understand and test potential computation efficiency robustness in state-of-the-art LLMs. By analyzing the working mechanism and implementation of 20,543 public-accessible LLMs, we observe a fundamental property in LLMs that could be manipulated in an adversarial manner to reduce computation efficiency significantly. Our interesting observation is that the output length determines the computation efficiency of LLMs instead of the input, where the output length depends on two factors: an often sufficiently large yet pessimistic pre-configured threshold controlling the max number of iterations and a runtime-generated end of sentence (EOS) token. Our key motivation is to generate test inputs that could sufficiently delay the generation of EOS such that LLMs would have to go through enough iterations to satisfy the pre-configured threshold. We presentLLMEffiChecker, which can work under both white-box setting and black-box setting. In the white-box scenario,LLMEffiCheckerdevelops a gradient-guided technique that searches for a minimal and unnoticeable perturbation at character-level, token-level, and structure-level. In the black-box scenario,LLMEffiCheckeremploys a causal inference-based approach to find critical tokens and similarly applies three levels of imperceptible perturbation to them. Both the white-box and black-box settings effectively delay the appearance of EOS, compelling these inputs to reach the naturally unreachable threshold. To demonstrate the effectiveness ofLLMEffiChecker, we conduct a systematic evaluation on nine publicly available LLMs: Google T5, AllenAI WMT14, Helsinki-NLP translator, Facebook FairSeq, UNICAMP-DL translator, MarianMT, Google FLAN-T5, MBZUAI LaMini-GPT, and Salesforce CodeGen. Experimental results show thatLLMEffiCheckercan increase on average LLMs’ response latency and energy consumption by 325% to 3,244% and 344% to 3,616%, respectively, by perturbing just one character or token in the input sentence. Our case study shows that inputs generated byLLMEffiCheckersignificantly affect the battery power in real-world mobile devices (i.e., drain more than 30 times battery power than normal inputs).
more » « less
Full Text Available
A Finite Element Method for Hyperbolic Metamaterials with Applications for Hyperlens

https://doi.org/10.1137/23M1591207

Liu, Fuhao; Yang, Wei; Li, Jichun (June 2024, SIAM Journal on Numerical Analysis)

Full Text Available
Atomic Ordering-Induced Ensemble Variation in Alloys Governs Electrocatalyst On/Off States

https://doi.org/10.1021/jacs.4c11753

Gong, Tianyao; Qiu, Guotao; He, Mo-Rigen; Safonova, Olga V; Yang, Wei-Chang; Raciti, David; Oses, Corey; Hall, Anthony Shoji (January 2025, Journal of the American Chemical Society)

Full Text Available
Automated Testing Linguistic Capabilities of NLP Models

https://doi.org/10.1145/3672455

Lee, Jaeseong; Chen, Simin; Mordahl, Austin; Liu, Cong; Yang, Wei; Wei, Shiyi (September 2024, ACM Transactions on Software Engineering and Methodology)

Natural language processing (NLP) has gained widespread adoption in the development of real-world applications. However, the black-box nature of neural networks in NLP applications poses a challenge when evaluating their performance, let alone ensuring it. Recent research has proposed testing techniques to enhance the trustworthiness of NLP-based applications. However, most existing works use a single, aggregated metric (i.e., accuracy) which is difficult for users to assess NLP model performance on fine-grained aspects, such as LCs. To address this limitation, we present ALiCT, an automated testing technique for validating NLP applications based on their LCs. ALiCT takes user-specified LCs as inputs and produces diverse test suite with test oracles for each of given LC. We evaluate ALiCT on two widely adopted NLP tasks, sentiment analysis and hate speech detection, in terms of diversity, effectiveness, and consistency. Using Self-BLEU and syntactic diversity metrics, our findings reveal that ALiCT generates test cases that are 190% and 2213% more diverse in semantics and syntax, respectively, compared to those produced by state-of-the-art techniques. In addition, ALiCT is capable of producing a larger number of NLP model failures in 22 out of 25 LCs over the two NLP applications.
more » « less
Full Text Available
DeciX: Explain Deep Learning Based Code Generation Applications

https://doi.org/10.1145/3660814

Chen, Simin; Li, Zexin; Yang, Wei; Liu, Cong (July 2024, Proceedings of the ACM on Software Engineering)

Deep learning-based code generation (DL-CG) applications have shown great potential for assisting developers in programming with human-competitive accuracy. However, lacking transparency in such applications due to the uninterpretable nature of deep learning models makes the automatically generated programs untrustworthy. In this paper, we develop DeciX, a first explanation method dedicated to DL-CG applications. DeciX is motivated by observing two unique properties of DL-CG applications: output-to-output dependencies and irrelevant value and semantic space. These properties violate the fundamental assumptions made in existing explainable DL techniques and thus cause applying existing techniques to DL-CG applications rather pessimistic and even incorrect. DeciX addresses these two limitations by constructing a causal inference dependency graph, containing a novel method leveraging causal inference that can accurately quantify the contribution of each dependency edge in the graph to the end prediction result. Proved by extensive experiments assessing popular, widely-used DL-CG applications and several baseline methods, DeciX is able to achieve significantly better performance compared to state-of-the-art in terms of several critical performance metrics, including correctness, succinctness, stability, and overhead. Furthermore, DeciX can be applied to practical scenarios since it does not require any knowledge of the DL-CG model under explanation. We have also conducted case studies that demonstrate the applicability of DeciX in practice.
more » « less
Full Text Available

« Prev Next »

Search for: All records